Abstract

It is well-known that linear dynamical systems with Gaussian noise and quadratic 
cost (LQG) satisfy a separation principle. Finding the optimal controller amounts
to solving separate dual problems: one for control and one for estimation. For the
discrete-time finite-horizon case, each problem is a simple forward or backward recur- 
sion. In this paper, we consider a generalization of the LQG problem in which there 
are two controllers. Each controller is responsible for one of two system inputs, but has 
access to different subsets of the available measurements. Our paper has three main 
contributions. First, we prove a fundamental structural result: sufficient statistics for 
the controllers can be expressed as conditional means of the global state. Second, we 
give explicit state-space formulae for the optimal controller. These formulae are remi- 
niscent of the classical LQG solution with dual forward and backward recursions, but 
with the important difference that they are intricately coupled. Lastly, we show how 
these recursions can be solved efficiently, with computational complexity comparable 
to that of the centralized problem. 

1 Introduction 

With the advent of large systems operating on a global scale such as the internet or power 
networks, the past decade has seen a resurgence of interest in decentralized control. For 
such large systems, it is inevitable that some control decisions must be made using only 
local or partial information. Two natural questions that arise are: 

1. Can the ever-growing information history be aggregated without compromising 
achievable performance? In other words, what are sufficient statistics for the decision- 
makers? 

2. When and how can optimal decentralized policies be efficiently computed? 

In this paper, we give complete answers to the above questions for a fundamental decen- 
tralized control problem: the two-player problem. 

Briefly, the two-player problem consists of two systems with their own local controllers. 
The systems are coupled. System 1 affects System 2 through its state and input but not
vice versa, and the controller for System 1 shares its measurement with the controller for
System 2 but not vice versa. A formal description of the problem is given in Section 3.

It is believed that decentralized control problems are likely hard in general [2, 14].
However, the two-player problem is partially nested [4], and thus admits an optimal
controller that is linear. In this sense, it is one of the simplest problems that still captures
the essential features that make decentralized control difficult.






Our paper has three main contributions. In Section 4, we find sufficient statistics for the
two-player problem under LQG assumptions. To our knowledge, these results are new. We 
also comment on possible generalizations if we relax the LQG assumptions and consider 
a general partially-observed Markov decision process (POMDP). Related results include 
networked Markov decision processes with delays [1] and broadcast structures [15]. Our 
paper uses a common information approach, which was first developed to solve a related 
class of problems with partial history sharing [11]. 

In Section 5, we give an explicit state-space solution to the two-player problem us- 
ing dynamic programming. Related approaches were used to solve this problem in the 
noise-free case, with exact state measurements. Existing solutions include sparsity [13], 
delays [6], or a mixture of both [7]. As we shall see, having noisy measurements intro- 
duces a nontrivial coupling between estimation and control and complicates the solution 
significantly. 

An explicit solution to the continuous-time version of the two-player problem as well 
as an extension to the broadcast case appeared in [8, 9, 10]. These works use a spec- 
tral factorization approach that is completely different from the common information 
approach used herein. Furthermore, they solve the problem over an infinite time horizon, 
which makes the coupling between estimation and control simpler due to the steady-state 
assumption. 

Lastly, in Section 6, we show how to efficiently compute the solution to the two-player 
problem, and show that it can be done with computational effort comparable to that 
required for the centralized version of the problem. Namely, computational effort is 
proportional to the length of the time horizon. 

2 Notation 

Real vectors and matrices are represented by lower- and upper-case letters respectively. 
Boldface symbols denote random vectors, and their non-boldface counterparts denote 
particular instantiations. The probability density function of $\mathbf{x}$ evaluated at $x$ is denoted
$P(\mathbf{x} = x)$, and conditional densities are written as $P(\mathbf{x} \mid \mathbf{y} = y)$. We write
$\mathbf{x} = \mathcal{N}(\mu, \Sigma)$ when $\mathbf{x}$ is normally distributed with mean $\mu$ and variance $\Sigma$.
In other words, the probability density function of $\mathbf{x}$ has the form

\[ P(\mathbf{x} = x) = \alpha \exp\left( -\tfrac{1}{2} (x - \mu)^\top \Sigma^{-1} (x - \mu) \right) \]

for some $\alpha > 0$. This paper considers stochastic processes in discrete time over a finite
time interval $[0, T]$. Time is indicated using subscripts, and we use the colon notation to
denote ranges. For example:

\[ x_{0:T-1} = \{ x_0, x_1, \dots, x_{T-1} \} \]

In general, all symbols are time-varying. In an effort to present general results while
keeping equations clear and concise, we introduce a new notation to represent a family of
equations. For example, when we write:

\[ x_+ \overset{t}{=} A x + w \]

we mean that $x_{t+1} = A_t x_t + w_t$ holds for $0 \le t \le T-1$. Note that the subscript "+"
indicates that we increment to $t+1$ for the associated symbol. We similarly overload the
summation symbol by writing for example

\[ \sum_t x^\top Q x \quad \text{to mean} \quad \sum_{t=0}^{T-1} x_t^\top Q_t x_t \]

Any time we use $t$ above a binary relation or below a summation, it is implied that
$0 \le t \le T-1$. There is no ambiguity because we use the same time horizon $T$ throughout
this paper.

We denote submatrices and subvectors by using subscripts, but we will often use
superscripts instead to avoid clutter. For example, we write $P^{21}_+$ and $x^1_{0:t}$ to avoid writing
$P_{21,+}$ and $x_{1,0:t}$ respectively. We also introduce a partition of the identity matrix $I = [E_1 \;\; E_2]$
where the dimensions are inferred by context. For example, if

\[ A = \begin{bmatrix} A_{11} & 0 \\ A_{21} & A_{22} \end{bmatrix} \]

then $E_2^\top A E_1 = A_{21}$ and $E_1^\top A = A_{11} E_1^\top$.
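The selector matrices can be checked numerically. The following is a minimal sketch, assuming block sizes of 2 and 3 and a randomly generated lower block-triangular matrix; the names `n1`, `n2`, `block_21_ok` and `block_row_ok` are illustrative and do not come from the paper.

```python
import numpy as np

# Hypothetical block sizes, chosen only for illustration.
n1, n2 = 2, 3

# Partition the identity as I = [E1 E2].
I = np.eye(n1 + n2)
E1, E2 = I[:, :n1], I[:, n1:]

# Random lower block-triangular matrix A = [[A11, 0], [A21, A22]].
rng = np.random.default_rng(0)
A = rng.standard_normal((n1 + n2, n1 + n2))
A[:n1, n1:] = 0.0
A11, A21 = A[:n1, :n1], A[n1:, :n1]

# E2^T A E1 extracts the (2,1) block.
block_21_ok = np.allclose(E2.T @ A @ E1, A21)

# E1^T A = A11 E1^T holds because the (1,2) block of A is zero.
block_row_ok = np.allclose(E1.T @ A, A11 @ E1.T)
```

The second identity relies on the zero upper-right block; for a full matrix it would generally fail.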



3 Problem statement 



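Consider two interconnected linear systems with the following state update and measurement
equations.

\[
\begin{aligned}
\begin{bmatrix} x^1_+ \\ x^2_+ \end{bmatrix} &\overset{t}{=}
\begin{bmatrix} A_{11} & 0 \\ A_{21} & A_{22} \end{bmatrix} \begin{bmatrix} x^1 \\ x^2 \end{bmatrix}
+ \begin{bmatrix} B_{11} & 0 \\ B_{21} & B_{22} \end{bmatrix} \begin{bmatrix} u^1 \\ u^2 \end{bmatrix}
+ \begin{bmatrix} w^1 \\ w^2 \end{bmatrix} \\
\begin{bmatrix} y^1 \\ y^2 \end{bmatrix} &\overset{t}{=}
\begin{bmatrix} C_{11} & 0 \\ C_{21} & C_{22} \end{bmatrix} \begin{bmatrix} x^1 \\ x^2 \end{bmatrix}
+ \begin{bmatrix} v^1 \\ v^2 \end{bmatrix}
\end{aligned} \tag{1}
\]

The random vectors in the collection

\[ \left\{ x_0, \begin{bmatrix} w_0 \\ v_0 \end{bmatrix}, \dots, \begin{bmatrix} w_{T-1} \\ v_{T-1} \end{bmatrix} \right\} \tag{2} \]

are mutually independent and jointly Gaussian with the following known probability
density functions.

\[
x_0 = \mathcal{N}(\mu_{\text{init}}, \Sigma_{\text{init}}), \qquad
\begin{bmatrix} w \\ v \end{bmatrix} \overset{t}{=} \mathcal{N}\left( 0, \begin{bmatrix} W & U^\top \\ U & V \end{bmatrix} \right) \tag{3}
\]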



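There are two controllers, and the information available to each controller at time $t$ is

\[
\begin{aligned}
\hat{i}_t &= \{ y^1_{0:t-1},\, u^1_{0:t-1} \} \\
i_t &= \{ y^1_{0:t-1},\, y^2_{0:t-1},\, u^1_{0:t-1},\, u^2_{0:t-1} \}
\end{aligned} \tag{4}
\]

The controllers select actions according to control strategies $f^i := (f^i_0, f^i_1, \dots, f^i_{T-1})$ for
$i = 1, 2$. That is,

\[ u^1_t = f^1_t(\hat{i}_t) \quad \text{and} \quad u^2_t = f^2_t(i_t) \quad \text{for } 0 \le t \le T-1 \tag{5} \]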

The performance of control strategies $f^1, f^2$ is measured by the finite horizon expected
quadratic cost given by

\[
\mathcal{J}(f^1, f^2) = \mathbb{E}^{f^1, f^2} \left( \sum_t
\begin{bmatrix} x \\ u \end{bmatrix}^\top
\begin{bmatrix} Q & S \\ S^\top & R \end{bmatrix}
\begin{bmatrix} x \\ u \end{bmatrix}
+ x_T^\top P_{\text{final}} x_T \right) \tag{6}
\]

The expectation is taken with respect to the joint probability measure on $(x_{0:T}, u_{0:T-1})$
induced by the choice of $f^1$ and $f^2$. We are interested in the following problem.

Problem 1 (Two-Player LQG). For the model (1)-(5), find control strategies $f^1, f^2$
that minimize the cost (6).



A related and well-known problem is the centralized LQG problem. It is the special 
case of the two-player problem for which there is a single decision-maker. 



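Problem 2 (Centralized LQG). Consider the state update and measurement equations

\[
\begin{aligned}
x_+ &\overset{t}{=} A x + B u + w \\
y &\overset{t}{=} C x + v
\end{aligned}
\]

Suppose that $x_0, \begin{bmatrix} w_0 \\ v_0 \end{bmatrix}, \dots, \begin{bmatrix} w_{T-1} \\ v_{T-1} \end{bmatrix}$ are mutually independent and jointly Gaussian
with distributions:

\[
x_0 = \mathcal{N}(\mu_{\text{init}}, \Sigma_{\text{init}}), \qquad
\begin{bmatrix} w \\ v \end{bmatrix} \overset{t}{=} \mathcal{N}\left( 0, \begin{bmatrix} W & U^\top \\ U & V \end{bmatrix} \right)
\]

Suppose $u_t = f_t(i_t)$, where $i_t := (y_{0:t-1}, u_{0:t-1})$, and our goal is to choose $f := f_{0:T-1}$
such that we minimize the expected quadratic cost

\[
\mathcal{J}(f) = \mathbb{E}^{f} \left( \sum_t
\begin{bmatrix} x \\ u \end{bmatrix}^\top
\begin{bmatrix} Q & S \\ S^\top & R \end{bmatrix}
\begin{bmatrix} x \\ u \end{bmatrix}
+ x_T^\top P_{\text{final}} x_T \right)
\]

The expectation is with respect to the joint probability measure on $(x_{0:T}, u_{0:T-1})$ induced
by the choice of $f$.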



In both Problem 1 and Problem 2, the sizes of the various matrices and vectors
may also vary with time. It is assumed that $\mu_{\text{init}}$, $\Sigma_{\text{init}}$, $P_{\text{final}}$, as well as the values of
$A, B, C, Q, R, S, U, V, W$ for all $t$, are available to all decision-makers for any $t \ge 0$.

We also clarify that while we often call the decision-making agents players, this is not a
game. The players are cooperative and their strategies are to be jointly optimized. Thus
at time $t$, Player 1 does not know Player 2's private information $\{ y^2_{0:t-1}, u^2_{0:t-1} \}$, but does
know how Player 2 plans to use its information, the strategy $f^2_t$.



4 Structural results 



In Problem 1, the lower triangular nature of the state, control and observation matrices
of (1) implies that Player 1's state and control actions affect Player 2's information but
not vice versa. Further, any information available to Player 1 is also available to Player 2.
Hence, Problem 1 is partially nested and the optimal strategies for the two players are
linear functions of their respective information histories [4, Thm. 2].

In this section, we show that the information histories can be aggregated into sufficient
statistics. In Sections 5 and 6, we will use this fact to derive a recursive finite-memory
implementation of the optimal controller. We start with a well-known structural result
for the centralized LQG problem (Problem 2).

Lemma 3. In Problem 2, there exists an optimal control strategy of the form

\[ u \overset{t}{=} K z \]

where $z_t := \mathbb{E}(\mathbf{x}_t \mid \mathbf{i}_t = i_t)$, and $K_{0:T-1}$ are fixed matrices of appropriate dimensions.

4.1 Structural result for Player 2



We now turn our attention to Problem 1. Consider any arbitrary linear strategy for
Player 1. Thus, Player 1's control actions are of the form

\[ u^1 \overset{t}{=} G \hat{i} \tag{7} \]

where $G_{0:T-1}$ are fixed matrices of appropriate dimensions and $\hat{i}_t$ is the realization of
Player 1's information. Given this strategy for Player 1, we want to find the optimal
strategy for Player 2.

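In the next two results, we show that once Player 1's strategy is fixed, finding the
optimal strategy for Player 2 amounts to solving a centralized LQG problem. Thus we
may apply the structural result presented in Lemma 3.

Lemma 4. Consider Problem 1, and assume any fixed strategy for Player 1 given by (7).
Define $\bar{x}_t$ as follows.

\[ \bar{x}_t := \begin{bmatrix} x_t \\ \hat{i}_t \end{bmatrix} \quad \text{for } 0 \le t \le T \]

Then, the following statements are true.

(i) There exist matrices $\bar{A}_t, \bar{B}_t, \bar{C}_t, \bar{D}_t$ such that

\[
\begin{aligned}
\bar{x}_0 &= \mathcal{N}(\bar{\mu}_{\text{init}}, \bar{\Sigma}_{\text{init}}) \\
\bar{x}_+ &\overset{t}{=} \bar{A} \bar{x} + \bar{B} u^2 + \bar{D} \begin{bmatrix} w \\ v \end{bmatrix} \\
y &\overset{t}{=} \bar{C} \bar{x} + v
\end{aligned}
\]

(ii) There exist matrices $\bar{Q}_t, \bar{R}_t, \bar{S}_t, \bar{P}_{\text{final}}$ such that the total expected cost can be written
as

\[
\mathbb{E} \left( \sum_t
\begin{bmatrix} \bar{x} \\ u^2 \end{bmatrix}^\top
\begin{bmatrix} \bar{Q} & \bar{S} \\ \bar{S}^\top & \bar{R} \end{bmatrix}
\begin{bmatrix} \bar{x} \\ u^2 \end{bmatrix}
+ \bar{x}_T^\top \bar{P}_{\text{final}} \bar{x}_T \right)
\]

Proof. The proof follows from the definition of $\bar{x}_t$, the state, observation, and cost
equations of Problem 1, and the fixed strategy for Player 1 given by (7). ■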

Theorem 5. Consider Problem 1. For any choice of Player 1's strategy, the optimal
strategy for Player 2 has the structure

\[ u^2 \overset{t}{=} H^1 \hat{i} + H^2 z \tag{8} \]

where $z_t := \mathbb{E}(\mathbf{x}_t \mid \mathbf{i}_t = i_t)$ and $\hat{i}_t \subset i_t$ is the realization of Player 1's information.

Proof. Lemma 4 implies that when Player 1's strategy is fixed, the optimization problem
for Player 2 is an instance of the centralized LQG problem (Problem 2) with $\bar{x}_t$ as the
state of the linear system, $y_t$ as the observation, and $u^2_t$ as the control action. Therefore,
by Lemma 3, the optimal strategy for Player 2 is of the form $u^2_t = H_t\, \mathbb{E}(\bar{\mathbf{x}}_t \mid \mathbf{i}_t = i_t)$ for
some matrix $H_t$. Further,

\[
\mathbb{E}(\bar{\mathbf{x}}_t \mid \mathbf{i}_t = i_t) =
\begin{bmatrix} \mathbb{E}(\mathbf{x}_t \mid \mathbf{i}_t = i_t) \\ \mathbb{E}(\hat{\mathbf{i}}_t \mid \mathbf{i}_t = i_t) \end{bmatrix}
= \begin{bmatrix} z_t \\ \hat{i}_t \end{bmatrix} \tag{9}
\]

where we used the fact that Player 1's information, $\hat{i}_t$, is a subset of Player 2's information,
$i_t$. Therefore, the optimal strategy for Player 2 is of the form $u^2_t = H_t\, \mathbb{E}(\bar{\mathbf{x}}_t \mid \mathbf{i}_t = i_t) =
H^1_t \hat{i}_t + H^2_t z_t$, as required. ■

4.2 Joint structural result



Theorem 5 implies that Player 2's optimal action can be written as

\[ u^2 \overset{t}{=} \tilde{u}^2 + H^2 z \tag{10} \]

where $\tilde{u}^2_t$ is a linear function of the information of Player 1. Note that both $\tilde{u}^2_t$ and $u^1_t$ are
(linear) functions of the same information. In order to further characterize the structure
of optimal strategies, we consider a coordinated system where a coordinator knows the
common information among the players (that is, $\hat{i}_t$) and selects both $u^1_t$ and $\tilde{u}^2_t$ based
on this common information. Once the coordinator selects $\tilde{u}^2_t$, Player 2's control action
is $u^2_t = H^2_t z_t + \tilde{u}^2_t$, for some $H^2_t$. It is clear that any strategy of the form (10) can be
implemented in the coordinated system.

Given an arbitrary choice of $H^2_t$, we want to find the optimal strategy for the coordinator.
As in Lemma 4, this can be formulated as a centralized LQG problem.

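Lemma 6. Consider Problem 1 where $u^2_t$ is given by (10), and assume any fixed choice
of $H^2_t$. Define $\tilde{x}_t$ as follows.

\[ \tilde{x}_t := \begin{bmatrix} x_t \\ z_t \end{bmatrix} \quad \text{for } 0 \le t \le T \]

Then, the following statements are true.

(i) There exist matrices $\tilde{A}_t, \tilde{B}_t, \tilde{D}_t, \tilde{\Sigma}_{\text{init}}$ and a vector $\tilde{\mu}_{\text{init}}$ such that

\[
\begin{aligned}
\tilde{x}_0 &= \mathcal{N}(\tilde{\mu}_{\text{init}}, \tilde{\Sigma}_{\text{init}}) \\
\tilde{x}_+ &\overset{t}{=} \tilde{A} \tilde{x} + \tilde{B} \begin{bmatrix} u^1 \\ \tilde{u}^2 \end{bmatrix} + \tilde{D} \begin{bmatrix} w \\ v \end{bmatrix} \\
y^1 &\overset{t}{=} C_{11} x^1 + v^1
\end{aligned}
\]

(ii) There exist matrices $\tilde{\Omega}_t$ and $\tilde{\Omega}_{\text{final}}$ such that the total expected cost can be written as

\[
\mathbb{E} \left( \sum_t
\begin{bmatrix} \tilde{x} \\ u^1 \\ \tilde{u}^2 \end{bmatrix}^\top
\tilde{\Omega}
\begin{bmatrix} \tilde{x} \\ u^1 \\ \tilde{u}^2 \end{bmatrix}
+ \tilde{x}_T^\top \tilde{\Omega}_{\text{final}} \tilde{x}_T \right)
\]

Proof. The proof follows from the definition of $\tilde{x}_t$, the state, observation, and cost
equations of Problem 1, and the fact that $z_t$ has a linear update equation. This linear
update equation is the standard Kalman filter, given explicitly in (16)-(17). ■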



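Theorem 7. The optimal strategies for the two players in Problem 1 are of the form

\[ u^1 \overset{t}{=} G \hat{z}, \qquad u^2 \overset{t}{=} H^1 \hat{z} + H^2 z \tag{11} \]

where $\hat{z}_t := \mathbb{E}(\mathbf{x}_t \mid \hat{\mathbf{i}}_t = \hat{i}_t)$ and $z_t := \mathbb{E}(\mathbf{x}_t \mid \mathbf{i}_t = i_t)$.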

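Proof. Lemma 6 implies that when the matrices $H^2_t$ in (10) are fixed, the optimization
problem for the coordinator is an instance of Problem 2 with $\tilde{x}_t$ as the state of the linear
system and $(u^1_t, \tilde{u}^2_t)$ as the control action. By Lemma 3, we obtain

\[ \begin{bmatrix} u^1_t \\ \tilde{u}^2_t \end{bmatrix} = \tilde{H}_t\, \mathbb{E}(\tilde{\mathbf{x}}_t \mid \hat{\mathbf{i}}_t = \hat{i}_t), \tag{12} \]

for some matrix $\tilde{H}_t$. Furthermore,

\[
\tilde{H}_t\, \mathbb{E}(\tilde{\mathbf{x}}_t \mid \hat{\mathbf{i}}_t = \hat{i}_t)
= \tilde{H}_t \begin{bmatrix} \mathbb{E}(\mathbf{x}_t \mid \hat{\mathbf{i}}_t = \hat{i}_t) \\ \mathbb{E}(\mathbf{z}_t \mid \hat{\mathbf{i}}_t = \hat{i}_t) \end{bmatrix}
= \tilde{H}_t \begin{bmatrix} \hat{z}_t \\ \mathbb{E}\big( \mathbb{E}(\mathbf{x}_t \mid \mathbf{i}_t) \mid \hat{\mathbf{i}}_t = \hat{i}_t \big) \end{bmatrix}
= \tilde{H}_t \begin{bmatrix} \hat{z}_t \\ \hat{z}_t \end{bmatrix} \tag{13}
\]

where we used the smoothing property of conditional expectations and the fact that
$\hat{i}_t \subset i_t$. Therefore, $u^1_t$ and $\tilde{u}^2_t$ are linear functions of Player 1's estimate $\hat{z}_t$, and
substituting into (10) gives (11). ■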

Theorem 7 shows that for Problem 1, a sufficient statistic is the set of conditional means
$(\hat{z}_t, z_t)$. Note that for a given realization of $i_t$, Player 2's estimate $z_t$ does not depend on
players' strategies. However, for a given realization of $\hat{i}_t$, Player 1's estimate $\hat{z}_t$ depends
on the choice of matrices $H^2_{0:t-1}$.
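The smoothing (tower) property used in the proof of Theorem 7 can be checked numerically for jointly Gaussian vectors, where conditional means are linear maps of the conditioning variable. The following is a minimal static sketch, not the paper's dynamic setting: `x`, `y1`, `y2` are hypothetical zero-mean jointly Gaussian vectors with a randomly generated covariance, and `y1` plays the role of the information shared by both players.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p1, p2 = 3, 2, 2          # hypothetical sizes of x, y1, y2
d = n + p1 + p2

# Random positive definite joint covariance of (x, y1, y2), zero mean.
G = rng.standard_normal((d, d))
cov = G @ G.T + np.eye(d)

ix = slice(0, n)             # indices of x
i1 = slice(n, n + p1)        # indices of y1 (information of the less informed agent)
iy = slice(n, d)             # indices of y = (y1, y2) (full information)

# E(x | y) = F_full @ y and E(x | y1) = F_hat @ y1 for zero-mean Gaussians.
F_full = cov[ix, iy] @ np.linalg.inv(cov[iy, iy])
F_hat = cov[ix, i1] @ np.linalg.inv(cov[i1, i1])

# E(y | y1) = Ey_given_y1 @ y1, so E(E(x | y) | y1) = F_full @ Ey_given_y1 @ y1.
Ey_given_y1 = cov[iy, i1] @ np.linalg.inv(cov[i1, i1])
F_tower = F_full @ Ey_given_y1

# Tower property: conditioning the finer estimate on the coarser information
# gives the coarser estimate.
tower_ok = np.allclose(F_tower, F_hat)
```

The check compares the two linear maps rather than sampled data, so it holds exactly up to floating-point error.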



4.3 Extension to the POMDP case 

In the centralized LQG problem (Problem 2), the result that optimal control actions 
are functions of the controller's conditional mean of the state is a consequence of the 
linear, Gaussian nature of the system. If the state update and measurement equations 
are nonlinear with non-Gaussian noise, the centralized problem becomes a POMDP. For 
POMDPs, optimal actions are functions of the controller's conditional probability density of
the state and not just the conditional mean. In other words:

\[
\begin{aligned}
\text{LQG:} \quad & u_t = K_t z_t && \text{where } z_t = \mathbb{E}(\mathbf{x}_t \mid \mathbf{i}_t = i_t) \\
\text{POMDP:} \quad & u_t = \phi_t(\pi_t) && \text{where } \pi_t = P(\mathbf{x}_t \mid \mathbf{i}_t = i_t)
\end{aligned}
\]

where $\phi_t$ is a (possibly) nonlinear function. We call $\pi_t$ the belief state. Note that $\pi_t$ is a
probability density function while $z_t$ is simply a real vector.
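To make the belief state concrete, the following is a minimal sketch of a belief update for a finite-state POMDP. It is an illustration under stated assumptions rather than anything from the paper: the state space is finite, `trans[x, x_next]` is a hypothetical transition matrix for the action that was applied, and `obs_lik[x]` is the likelihood of the observation received at the current time. Since $i_t$ contains measurements only up to time $t-1$, the update first conditions on the new measurement and then propagates through the dynamics.

```python
import numpy as np

def belief_update(pi, trans, obs_lik):
    """Map the belief on x_t given i_t to the belief on x_{t+1} given i_{t+1}.

    pi      : shape (n,), current belief P(x_t = x | i_t)
    trans   : shape (n, n), trans[x, x_next] = P(x_{t+1} = x_next | x_t = x, u_t)
    obs_lik : shape (n,), obs_lik[x] = P(y_t = observed value | x_t = x)
    """
    # Measurement step: Bayes' rule with the newly received observation y_t.
    posterior = pi * obs_lik
    posterior = posterior / posterior.sum()
    # Prediction step: propagate the posterior through the transition kernel.
    return posterior @ trans

# Two-state example with made-up numbers.
pi0 = np.array([0.5, 0.5])
trans = np.array([[0.9, 0.1],
                  [0.3, 0.7]])
obs_lik = np.array([0.8, 0.2])
pi1 = belief_update(pi0, trans, obs_lik)
```

In the LQG case the same two steps reduce to the Kalman filter, and the belief is fully described by its mean and covariance.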

For the two-player problem (Problem 1), the simple form of the structural results
obtained in Theorems 5 and 7 is also a consequence of the information structure of the
two-player problem and the linear, Gaussian nature of the system. It is interesting to
derive the corresponding POMDP result for nonlinear systems with non-Gaussian noise but
with the same two-player information structure.

A straightforward extension of Lemma 4 shows that for any choice of Player 1's strategy,
Player 2's optimization problem is a POMDP and the optimal strategy for Player 2 has
the structural form:

\[ u^2_t = \gamma_t(\pi_t, \hat{i}_t) \]

where $\gamma_t$ is a (possibly) nonlinear function. With this structural result for Player 2, we can
try to retrace the coordinator-based arguments of Section 4.2. Because Player 2's optimal
policy is no longer linear in this case, the coordinator's problem is more complicated. The
coordinator still knows Player 1's information, but it is now required to select a control
action for Player 1 and a function that maps Player 2's belief $\pi_t$ to Player 2's action. The
coordinator's problem can be viewed as a POMDP with $(x_t, \pi_t)$ as the state. Therefore,
the structural result for the coordinator's problem involves a belief on the pair $(x_t, \pi_t)$.
Not only is the coordinator required to keep a belief on the state, it is also required to
keep a belief on Player 2's belief on the state.

As shown in Section 4, the structures of the optimal strategies for the centralized and 
two-player problems in the LQG case are of comparable complexity. However, this is 
not the case for the nonlinear, non-Gaussian versions of these problems. While both 
problems can be cast as POMDPs, the centralized case requires maintaining a belief on 
the system state, while the two-player case requires maintaining a belief on a belief. This
is a substantially more complicated object.

5 Explicit solution

In this section, we use the structural results of Section 4 to derive an explicit and efficiently 
computable state-space realization for the optimal controller for Problem 1. 

To ensure a unique optimal controller with a recursively computable structure, we make 
some additional mild assumptions, which we list below. 

Main assumptions. We assume the following.

\[ \Sigma_{\text{init}} \ge 0, \quad \text{and} \quad V \overset{t}{>} 0 \tag{14} \]

\[ P_{\text{final}} \ge 0, \quad \text{and} \quad R \overset{t}{>} 0 \tag{15} \]

The assumptions that $V_t > 0$ and $R_t > 0$ are made for simplicity and can generally be
relaxed. For example, it is only required that $C_t \Sigma_t C_t^\top + V_t > 0$, so as long as this holds,
we can have $V_t \ge 0$.

The well-known solution to the centralized LQG problem (Problem 2) is given in the
following lemma.

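Lemma 8. Consider Problem 2 and suppose the main assumptions (14)-(15) hold. The
optimal policy is

\[
\begin{aligned}
z_0 &= \mu_{\text{init}} \\
z_+ &\overset{t}{=} A z + B u - L (y - C z) \\
u &\overset{t}{=} K z
\end{aligned} \tag{16}
\]

where $L_{0:T-1}$ satisfies the forward recursion

\[
\begin{aligned}
\Sigma_0 &= \Sigma_{\text{init}} \\
\Sigma_+ &\overset{t}{=} A \Sigma A^\top + L (C \Sigma A^\top + U) + W \\
L &\overset{t}{=} -(A \Sigma C^\top + U^\top)(C \Sigma C^\top + V)^{-1}
\end{aligned} \tag{17}
\]

and $K_{0:T-1}$ satisfies the backward recursion

\[
\begin{aligned}
P_T &= P_{\text{final}} \\
P &\overset{t}{=} A^\top P_+ A + (A^\top P_+ B + S) K + Q \\
K &\overset{t}{=} -(B^\top P_+ B + R)^{-1} (B^\top P_+ A + S^\top)
\end{aligned} \tag{18}
\]

For every $t$, the belief state has the distribution

\[ P(\mathbf{x}_t \mid \mathbf{i}_t = i_t) = \mathcal{N}(z_t, \Sigma_t) \tag{19} \]

and the optimal average cost is given by

\[
\mathcal{J}_0 = \mu_{\text{init}}^\top P_0 \mu_{\text{init}} + \operatorname{tr}(P_0 \Sigma_{\text{init}})
+ \sum_t \Big( \operatorname{tr}(P_+ W) + \operatorname{tr}\big[ \Sigma K^\top (B^\top P_+ B + R) K \big] \Big) \tag{20}
\]

Proof. See for example [5] or [12]. ■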

Note that Lemma 8 holds in great generality. All system, cost, and covariance matrices 
may vary with time. The above formulae hold even in the case where the dimensions of 
the matrices are different at every timestep. 
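The two recursions of Lemma 8 are straightforward to implement. The following is a minimal NumPy sketch of (17) and (18), assuming for brevity that all matrices are time-invariant (the lemma itself allows them to vary with time); the function name `centralized_lqg_gains` and its argument order are illustrative choices, not taken from the paper.

```python
import numpy as np

def centralized_lqg_gains(A, B, C, W, V, U, Q, R, S, Sigma_init, P_final, T):
    """Run the forward recursion (17) and the backward recursion (18).

    Assumes time-invariant matrices (a simplification for this sketch).
    U is the cross-covariance block cov(v, w), as in the noise model.
    Returns lists Sigma[0..T], L[0..T-1], P[0..T], K[0..T-1].
    """
    # Forward (estimation) recursion: covariance Sigma and estimator gain L.
    Sigma, L = [Sigma_init], []
    for t in range(T):
        Sig = Sigma[t]
        L_t = -(A @ Sig @ C.T + U.T) @ np.linalg.inv(C @ Sig @ C.T + V)
        Sigma.append(A @ Sig @ A.T + L_t @ (C @ Sig @ A.T + U) + W)
        L.append(L_t)

    # Backward (control) recursion: cost-to-go P and control gain K.
    P = [None] * (T + 1)
    K = [None] * T
    P[T] = P_final
    for t in reversed(range(T)):
        Pn = P[t + 1]
        K[t] = -np.linalg.solve(B.T @ Pn @ B + R, B.T @ Pn @ A + S.T)
        P[t] = A.T @ Pn @ A + (A.T @ Pn @ B + S) @ K[t] + Q
    return Sigma, L, P, K

# Scalar example with all parameters equal to 1 and no cross terms.
one = np.eye(1)
Sigma, L, P, K = centralized_lqg_gains(
    A=one, B=one, C=one, W=one, V=one, U=0 * one,
    Q=one, R=one, S=0 * one, Sigma_init=one, P_final=one, T=1)
```

The two loops never reference each other, which is the separation that the two-player recursions below do not enjoy. Each loop makes one pass over the horizon, so the cost grows linearly in T.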
The main result of this section is a state-space solution to Problem 1, the two-player
problem. The result, given below in Theorem 9, is similar in structure and generality to
Lemma 8, but with one important difference. In Lemma 8, the gains $K_{0:T-1}$ and $L_{0:T-1}$
can be computed separately using different recursions. Thus, Lemma 8 provides both a
solution and a recipe for its construction. In contrast, the recursions for $\hat{K}_{0:T-1}$ and $\hat{L}_{0:T-1}$
found in Theorem 9 are coupled. Therefore, Theorem 9 provides an implicitly defined
solution, but no obvious construction method. In Section 6, we make our solution explicit
by showing how the various gains found in Theorem 9 can be efficiently computed.

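Theorem 9. Consider Problem 1 and suppose the main assumptions (14)-(15) hold. The
optimal policy is

\[
\begin{aligned}
\hat{z}_+ &\overset{t}{=} A \hat{z} + B \hat{u} - \hat{L} (y - C \hat{z}) \\
\hat{u} &\overset{t}{=} K \hat{z}
\end{aligned} \tag{21}
\]

\[
\begin{aligned}
z_+ &\overset{t}{=} A z + B u - L (y - C z) \\
u &\overset{t}{=} K \hat{z} + \hat{K} (z - \hat{z})
\end{aligned} \tag{22}
\]

where $L_{0:T-1}$ and $K_{0:T-1}$ satisfy (17)-(18). If we define $\hat{A} := A + B \hat{K} + \hat{L} C$, then $\hat{L}_{0:T-1}$
satisfies the recursion

\[
\begin{aligned}
\hat{\Sigma}_+ &\overset{t}{=} \Sigma_+ + \hat{A} (\hat{\Sigma} - \Sigma) \hat{A}^\top + (\hat{L} - L)(C \Sigma C^\top + V)(\hat{L} - L)^\top \\
\hat{L} &\overset{t}{=} -\big( A \hat{\Sigma} C^\top + U^\top + B \hat{K} (\hat{\Sigma} - \Sigma) C^\top \big) E_1 \big( C_{11} \hat{\Sigma}^{11} C_{11}^\top + V_{11} \big)^{-1} E_1^\top
\end{aligned} \tag{23}
\]

and $\hat{K}_{0:T-1}$ satisfies the recursion

\[
\begin{aligned}
\hat{P}_T &= P_{\text{final}} \\
\hat{P} &\overset{t}{=} P + \hat{A}^\top (\hat{P}_+ - P_+) \hat{A} + (\hat{K} - K)^\top (B^\top P_+ B + R)(\hat{K} - K) \\
\hat{K} &\overset{t}{=} -E_2 \big( B_{22}^\top \hat{P}^{22}_+ B_{22} + R_{22} \big)^{-1} E_2^\top \big( B^\top \hat{P}_+ A + S^\top + B^\top (\hat{P}_+ - P_+) \hat{L} C \big)
\end{aligned} \tag{24}
\]

For every $t$, the belief states have the distributions

\[
\begin{aligned}
P(\mathbf{x}_t \mid \hat{\mathbf{i}}_t = \hat{i}_t) &= \mathcal{N}(\hat{z}_t, \hat{\Sigma}_t) \\
P(\mathbf{x}_t \mid \mathbf{i}_t = i_t) &= \mathcal{N}(z_t, \Sigma_t)
\end{aligned} \tag{25}
\]

and the optimal average cost is given by

\[
\begin{aligned}
\mathcal{J}_0 = \mu_{\text{init}}^\top P_0 \mu_{\text{init}} + \operatorname{tr}(P_0 \Sigma_{\text{init}})
+ \sum_t \Big( & \operatorname{tr}(P_+ W) + \operatorname{tr}\big[ \Sigma K^\top (B^\top P_+ B + R) K \big] \\
& + \operatorname{tr}\big[ (\hat{\Sigma} - \Sigma)(\hat{K} - K)^\top (B^\top P_+ B + R)(\hat{K} - K) \big] \Big)
\end{aligned} \tag{26}
\]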

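Proof. The result of Theorem 5 implies that one may choose an optimal policy for
Problem 1 of the form

\[ u \overset{t}{=} \tilde{u} + \hat{K} z \tag{27} \]

where $\tilde{u}_t$ is a function of $\hat{i}_t$ and $\hat{K}_t$ is a matrix whose first block-row is zero. Our first step
will be to fix $\hat{K}_t$ and to solve for the optimal $\tilde{u}_t$. We begin by computing the estimators
defined as

\[
\begin{aligned}
\hat{z}_t &= \mathbb{E}(\mathbf{x}_t \mid \hat{\mathbf{i}}_t = \hat{i}_t) \\
z_t &= \mathbb{E}(\mathbf{x}_t \mid \mathbf{i}_t = i_t)
\end{aligned}
\]

Computing $z_t$ amounts to solving a standard centralized Kalman filtering problem. By
Lemma 8, $z_t$ satisfies (16) where $L_{0:T-1}$ is given by (17). To compute $\hat{z}_t$, we construct
an equivalent centralized estimation problem and appeal once again to Lemma 8. Notice
that

\[ \hat{z}_t = \mathbb{E}\big( \mathbb{E}(\mathbf{x}_t \mid \mathbf{i}_t) \mid \hat{\mathbf{i}}_t = \hat{i}_t \big) = \mathbb{E}(\mathbf{z}_t \mid \hat{\mathbf{i}}_t = \hat{i}_t) \]

So we may estimate $x_t$ by estimating $z_t$ instead. Substituting the definitions for $u_t$ and
$y_t$ into (1) and (22), we obtain the state and measurement equations

\[
\begin{aligned}
\begin{bmatrix} z_+ \\ e_+ \end{bmatrix} &\overset{t}{=}
\begin{bmatrix} A + B \hat{K} & -L C \\ 0 & A + L C \end{bmatrix} \begin{bmatrix} z \\ e \end{bmatrix}
+ \begin{bmatrix} B \\ 0 \end{bmatrix} \tilde{u}
+ \begin{bmatrix} -L v \\ w + L v \end{bmatrix} \\
y^1 &\overset{t}{=} E_1^\top C \begin{bmatrix} I & I \end{bmatrix} \begin{bmatrix} z \\ e \end{bmatrix} + E_1^\top v
\end{aligned}
\]

where we have defined the error signal $e := x - z$. Apply Lemma 8 to compute the
$\Sigma$-recursion. A straightforward induction argument shows that the covariance and gain
matrices that satisfy (17) at time $t$ are given by

\[
\begin{bmatrix} \hat{\Sigma}_t - \Sigma_t & 0 \\ 0 & \Sigma_t \end{bmatrix}
\quad \text{and} \quad
\begin{bmatrix} \hat{L}_t E_1 \\ 0 \end{bmatrix}
\]

where $\hat{\Sigma}_t$ and $\hat{L}_t$ satisfy (23). The block-diagonal form of the covariance matrix verifies
that $z_t$ and $e_t$ are conditionally independent given $\hat{i}_t$. Computing the estimation
equations (16), we find that the estimate of $e_t$ is 0, and

\[ \hat{z}_+ \overset{t}{=} \hat{A} \hat{z} + B \tilde{u} - \hat{L} y \tag{28} \]

The state and input split into conditionally independent parts

\[
\begin{bmatrix} x \\ u \end{bmatrix} \overset{t}{=}
\begin{bmatrix} z \\ \tilde{u} + \hat{K} z \end{bmatrix} + \begin{bmatrix} e \\ 0 \end{bmatrix}
\]

so the only relevant part of the cost (6) is

\[
\mathbb{E} \left( \sum_t
\begin{bmatrix} z \\ \tilde{u} + \hat{K} z \end{bmatrix}^\top
\begin{bmatrix} Q & S \\ S^\top & R \end{bmatrix}
\begin{bmatrix} z \\ \tilde{u} + \hat{K} z \end{bmatrix}
+ z_T^\top P_{\text{final}} z_T \right)
\]

Applying the $P$-recursion (18) from Lemma 8 to solve for $\tilde{u}$, we find after some algebra
that

\[ \tilde{u} \overset{t}{=} (K - \hat{K}) \hat{z} \tag{29} \]

where $K_{0:T-1}$ is the centralized gain given by (18). Substituting (29) into (28), we recover
the desired form for Player 1's estimator (21).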

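We have shown thus far that for any $\hat{K}_t$, the optimal estimators have the form (21)-(22),
where (17), (23), and (25) hold. In particular, the optimal input is

\[ u \overset{t}{=} K \hat{z} + \hat{K} (z - \hat{z}) \tag{30} \]

where $K_t$ is given by (18). Our final step is to solve for the optimal $\hat{K}_t$. Rewrite the
input (30) as

\[ u \overset{t}{=} K \hat{z} + E_2 \bar{u} \]

where we have collected all the $\hat{K}_t$ dependency into the unknown $\bar{u}_t$, and used the fact
that the first block-row of $\hat{K}_t$ is zero. We will solve the relaxed problem of finding $\bar{u}_t$
rather than solving for $\hat{K}_t$, and show that the relaxation is exact. Gathering the state
equations (1) and the estimator equations (21)-(22), we obtain

\[
\begin{aligned}
\begin{bmatrix} x_+ \\ \hat{e}_+ \end{bmatrix} &\overset{t}{=}
\begin{bmatrix} A + B K & -B K \\ 0 & A + \hat{L} C \end{bmatrix} \begin{bmatrix} x \\ \hat{e} \end{bmatrix}
+ \begin{bmatrix} B E_2 \\ B E_2 \end{bmatrix} \bar{u}
+ \begin{bmatrix} w \\ w + \hat{L} v \end{bmatrix} \\
y &\overset{t}{=} \begin{bmatrix} C & 0 \end{bmatrix} \begin{bmatrix} x \\ \hat{e} \end{bmatrix} + v
\end{aligned}
\]

where we have defined the error signal $\hat{e} := x - \hat{z}$. The cost (6) is given by

\[
\mathbb{E} \left( \sum_t
\begin{bmatrix} x \\ E_2 \bar{u} + K \hat{z} \end{bmatrix}^\top
\begin{bmatrix} Q & S \\ S^\top & R \end{bmatrix}
\begin{bmatrix} x \\ E_2 \bar{u} + K \hat{z} \end{bmatrix}
+ x_T^\top P_{\text{final}} x_T \right)
\]

where the correct coordinates can be obtained by substituting $\hat{z} = x - \hat{e}$. Now apply
Lemma 8 to compute the $P$-recursion. A straightforward induction argument shows that
the cost-to-go and gain matrices that satisfy (18) at time $t$ are given by

\[
\begin{bmatrix} P_t & 0 \\ 0 & \hat{P}_t - P_t \end{bmatrix}
\quad \text{and} \quad
\begin{bmatrix} 0 & E_2^\top \hat{K}_t \end{bmatrix}
\]

where $\hat{P}_t$ and $\hat{K}_t$ satisfy (24). It follows from Lemma 8 that the optimal input is

\[
\begin{aligned}
\bar{u}_t &= \begin{bmatrix} 0 & E_2^\top \hat{K}_t \end{bmatrix}
\begin{bmatrix} \mathbb{E}(\mathbf{x}_t \mid \mathbf{i}_t = i_t) \\ \mathbb{E}(\hat{\mathbf{e}}_t \mid \mathbf{i}_t = i_t) \end{bmatrix} \\
&= E_2^\top \hat{K}_t\, \mathbb{E}(\mathbf{x}_t - \hat{\mathbf{z}}_t \mid \mathbf{i}_t = i_t) \\
&= E_2^\top \hat{K}_t (z_t - \hat{z}_t)
\end{aligned}
\]

Despite allowing $\bar{u}_t$ to depend on the full measurement history $i_t$, we find that it only
depends on $(z_t - \hat{z}_t)$, so the relaxation is exact and the proof is complete.

The dynamic programming recursion above also yields an expression for the optimal
expected cost. After further algebraic manipulations, which we omit in the interest of
space, we obtain (26). ■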



6 Efficient computation 

Theorem 9 provides a state-space realization for the two-player problem similar to
Lemma 8, with an important difference. In Lemma 8, the recursions (17) and (18) can
be solved independently by propagating time forward or backward respectively. However,
the recursions (23)-(24) are coupled in an intricate way. Both the $\hat{P}$ and $\hat{\Sigma}$ recursions
contain $\hat{A}$, which depends on $\hat{K}$ and $\hat{L}$. Furthermore, the equation for $\hat{K}$ contains $\hat{L}$ and
vice versa.

Despite being nonlinear difference equations coupled across all timesteps, the recursions
(23)-(24) can be solved efficiently. In the following theorem, we show that the
equations for $\hat{\Sigma}, \hat{P}, \hat{L}, \hat{K}$ can be reduced to a linear two-point boundary-value problem
and thereby solved as efficiently as (17)-(18).
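Theorem 10. In the solution to Problem 1 given by (21)-(22), the gains $\hat{L}_{0:T-1}$, $\hat{K}_{0:T-1}$
are of the form

\[
\hat{L} \overset{t}{=} \begin{bmatrix} M & 0 \\ \hat{L}^{21} & 0 \end{bmatrix}
\quad \text{and} \quad
\hat{K} \overset{t}{=} \begin{bmatrix} 0 & 0 \\ \hat{K}^{21} & J \end{bmatrix}
\]

where $M_{0:T-1}$ satisfies the forward recursion

\[
\begin{aligned}
T_0 &= \Sigma^{11}_{\text{init}} \\
T_+ &\overset{t}{=} A_{11} T A_{11}^\top + M (C_{11} T A_{11}^\top + U_{11}) + W_{11} \\
M &\overset{t}{=} -(A_{11} T C_{11}^\top + U_{11}^\top)(C_{11} T C_{11}^\top + V_{11})^{-1}
\end{aligned} \tag{31}
\]

and $J_{0:T-1}$ satisfies the backward recursion

\[
\begin{aligned}
F_T &= P^{22}_{\text{final}} \\
F &\overset{t}{=} A_{22}^\top F_+ A_{22} + (A_{22}^\top F_+ B_{22} + S_{22}) J + Q_{22} \\
J &\overset{t}{=} -(B_{22}^\top F_+ B_{22} + R_{22})^{-1} (B_{22}^\top F_+ A_{22} + S_{22}^\top)
\end{aligned} \tag{32}
\]

Finally, $\hat{\Sigma}^{21}_{0:T}$, $\hat{L}^{21}_{0:T-1}$, $\hat{P}^{21}_{0:T}$, $\hat{K}^{21}_{0:T-1}$ satisfy the coupled forward and backward recursions

\[
\begin{aligned}
\hat{\Sigma}^{21}_0 &= \Sigma^{21}_{\text{init}} \\
\hat{\Sigma}^{21}_+ &\overset{t}{=} A_J \hat{\Sigma}^{21} A_M^\top + B_{22} \hat{K}^{21} (T - \Sigma^{11}) A_M^\top + (A_{21} T - B_{22} J \Sigma^{21}) A_M^\top + U_{12}^\top M^\top + W_{21} \\
\hat{L}^{21} &\overset{t}{=} -\big( A_J \hat{\Sigma}^{21} C_{11}^\top + B_{22} \hat{K}^{21} (T - \Sigma^{11}) C_{11}^\top + (A_{21} T - B_{22} J \Sigma^{21}) C_{11}^\top + U_{12}^\top \big) \\
&\qquad \times (C_{11} T C_{11}^\top + V_{11})^{-1}
\end{aligned} \tag{33}
\]

\[
\begin{aligned}
\hat{P}^{21}_T &= P^{21}_{\text{final}} \\
\hat{P}^{21} &\overset{t}{=} A_J^\top \hat{P}^{21}_+ A_M + A_J^\top (F_+ - P^{22}_+) \hat{L}^{21} C_{11} + A_J^\top (F_+ A_{21} - P^{21}_+ M C_{11}) + J^\top S_{12}^\top + Q_{21} \\
\hat{K}^{21} &\overset{t}{=} -(B_{22}^\top F_+ B_{22} + R_{22})^{-1} \\
&\qquad \times \big( B_{22}^\top \hat{P}^{21}_+ A_M + B_{22}^\top (F_+ - P^{22}_+) \hat{L}^{21} C_{11} + S_{12}^\top + B_{22}^\top (F_+ A_{21} - P^{21}_+ M C_{11}) \big)
\end{aligned} \tag{34}
\]

where $A_M := A_{11} + M C_{11}$ and $A_J := A_{22} + B_{22} J$.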



Proof. This result follows from Theorem 9 and some straightforward algebra, so we omit 
the details. The recursions (31) and (32) are obtained by simplifying the 11 block of (23) 
and the 22 block of (24), respectively. Finally, the recursions (33) and (34) are obtained 
by simplifying the 21 blocks of (23) and (24) respectively. ■ 

Theorem 10 reduces the coupled recursions found in Theorem 9 to a two-point linear
boundary value problem. From a computational standpoint, computing the matrices
$L_t, M_t, K_t, J_t$ using (17)-(18) and (31)-(32) requires recursing through the entire time
horizon. This requires $O(T)$ operations.

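It turns out that $\hat{\Sigma}^{21}, \hat{P}^{21}, \hat{L}^{21}, \hat{K}^{21}$ (and consequently $\hat{L}_t$ and $\hat{K}_t$) can also be computed
in $O(T)$. To see why, note that (33)-(34) are of the form

\[
\begin{aligned}
\hat{\Sigma}^{21}_0 &= \Sigma^{21}_{\text{init}} & \hat{P}^{21}_T &= P^{21}_{\text{final}} \\
\hat{\Sigma}^{21}_+ &\overset{t}{=} g_1(\hat{\Sigma}^{21}, \hat{K}^{21}) & \hat{P}^{21} &\overset{t}{=} g_2(\hat{P}^{21}_+, \hat{L}^{21}) \\
\hat{L}^{21} &\overset{t}{=} g_3(\hat{\Sigma}^{21}, \hat{K}^{21}) & \hat{K}^{21} &\overset{t}{=} g_4(\hat{P}^{21}_+, \hat{L}^{21})
\end{aligned} \tag{35}
\]

where $g_1, \dots, g_4$ are affine functions. Eliminating $\hat{L}^{21}$ and $\hat{K}^{21}$ from (35) using the last
row of equations,

\[
\begin{aligned}
\hat{\Sigma}^{21}_0 &= \Sigma^{21}_{\text{init}} & \hat{P}^{21}_T &= P^{21}_{\text{final}} \\
\hat{\Sigma}^{21}_+ &\overset{t}{=} h_1(\hat{\Sigma}^{21}, \hat{P}^{21}_+) & \hat{P}^{21} &\overset{t}{=} h_2(\hat{\Sigma}^{21}, \hat{P}^{21}_+)
\end{aligned} \tag{36}
\]

where $h_1$ and $h_2$ are affine functions. Now let

\[ \eta := \begin{bmatrix} \operatorname{vec} \hat{P}^{21} \\ \operatorname{vec} \hat{\Sigma}^{21} \end{bmatrix} \]

where $\operatorname{vec} X$ is the vector found by stacking the columns of $X$. Then (36) is a
block-tridiagonal system of the form

\[
\begin{bmatrix}
I & H_1 & & & \\
G_1 & I & H_2 & & \\
& G_2 & I & \ddots & \\
& & \ddots & \ddots & H_{T-1} \\
& & & G_{T-1} & I
\end{bmatrix}
\begin{bmatrix} \eta_0 \\ \eta_1 \\ \eta_2 \\ \vdots \\ \eta_{T-1} \end{bmatrix}
=
\begin{bmatrix} c_0 \\ c_1 \\ c_2 \\ \vdots \\ c_{T-1} \end{bmatrix}
\]

for some constant matrices $G_{1:T-1}$ and $H_{1:T-1}$ and a constant vector $c_{0:T-1}$. Equations
of this form can be solved in $O(T)$ using for example block tridiagonal LU factorization [3,
§ 4.5.1].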
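For illustration, a block-tridiagonal system can be solved in a single forward and backward sweep. The following is a minimal sketch of block elimination without pivoting (a block version of the Thomas algorithm), offered as one possible realization rather than the specific routine of [3]; it assumes every pivot block encountered is invertible. The function name and argument layout are hypothetical.

```python
import numpy as np

def solve_block_tridiagonal(lower, diag, upper, rhs):
    """Solve a block-tridiagonal linear system in O(number of blocks).

    Row i reads: lower[i] @ x[i-1] + diag[i] @ x[i] + upper[i] @ x[i+1] = rhs[i].
    lower[0] and upper[-1] are ignored. No pivoting is performed, so each
    pivot block is assumed to be invertible.
    """
    n = len(diag)
    d, r = [diag[0]], [rhs[0]]
    # Forward sweep: eliminate the sub-diagonal blocks.
    for i in range(1, n):
        m = np.linalg.solve(d[i - 1].T, lower[i].T).T   # lower[i] @ inv(d[i-1])
        d.append(diag[i] - m @ upper[i - 1])
        r.append(rhs[i] - m @ r[i - 1])
    # Backward sweep: back-substitution.
    x = [None] * n
    x[-1] = np.linalg.solve(d[-1], r[-1])
    for i in range(n - 2, -1, -1):
        x[i] = np.linalg.solve(d[i], r[i] - upper[i] @ x[i + 1])
    return x
```

Each sweep performs a fixed amount of work per block, so the total cost is proportional to the number of blocks.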

Therefore, the optimal controller for the two-player problem presented in Theorem 9 
can be computed with comparable effort to its centralized counterpart in Lemma 8. 
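The passage from affine matrix equations to a linear system in stacked vectors rests on the standard identity $\operatorname{vec}(AXB) = (B^\top \otimes A) \operatorname{vec}(X)$. The following is a minimal NumPy check of this identity with randomly generated matrices of hypothetical sizes; `vec` is a helper defined here, not a library function.

```python
import numpy as np

def vec(X):
    """Stack the columns of X into a single vector (column-major order)."""
    return X.reshape(-1, order="F")

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 2))
X = rng.standard_normal((2, 4))
B = rng.standard_normal((4, 5))

# vec(A X B) equals (B^T kron A) vec(X): a linear map in X becomes
# an ordinary matrix acting on vec(X).
lhs = vec(A @ X @ B)
rhs = np.kron(B.T, A) @ vec(X)
identity_holds = np.allclose(lhs, rhs)
```

Applying this term by term to an affine function of a matrix unknown yields the matrix coefficients of the corresponding linear system in the stacked vector.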

Note that the infinite-horizon two-player problem can be solved by making suitable
assumptions on the system parameters and taking limits in Theorem 10. The recursions
(17)-(18) and (31)-(32) become algebraic Riccati equations, and the coupled recursions
(35) become a small set of linear equations.



7 Concluding remarks 

In this paper, we used a coordinator-based approach to derive a new structural result 
for the two-player problem, a fundamental problem in decentralized control. Our results 
generalize those from classical LQG theory in a very intuitive way. Rather than main- 
taining a single estimate of the state, two separate estimates must be maintained, to 
account for the two different sets of information available. As in the centralized case, 
finding the optimal two-player controller requires solving forward and backward recur- 
sions for estimation and control respectively. The key difference is that the recursions 
for the two-player case are coupled and must be solved together. Finally, we show that 
these recursions can be solved as efficiently as in the centralized case, with computational 
complexity proportional to the length of the time horizon. 

An effort was made to express our results in a form that showcases the duality between 
estimation and control. This duality is apparent in (23)-(24), (31)-(34), and in the proof 
of Theorem 9. The extent of the symmetry and duality observed in the solution is perhaps 
unexpected given the information structure of the problem. Indeed, there seems to be 
a greater burden on the second player since he receives more measurements and must 
correct for the estimation errors inevitably made by the first player. However, from a 
different perspective, there also seems to be a greater burden on the first player since he 
has more control authority and must act to influence states of the system that the second 
player cannot control. Thus, the first player's lack of observability mirrors the second 
player's lack of controllability. 